Conventional and frugal methods of estimating COVID-19-related excess deaths and undercount factors

Across the world, the officially reported number of COVID-19 deaths is likely an undercount. Establishing true mortality is key to improving data transparency and strengthening public health systems to tackle future disease outbreaks. In this study, we estimated excess deaths during the COVID-19 pandemic in the Pune region of India. Excess deaths are defined as the number of additional deaths relative to those expected from pre-COVID-19-pandemic trends. We integrated data from: (a) epidemiological modeling using pre-pandemic all-cause mortality data, (b) discrepancies between media-reported death compensation claims and official reported mortality, and (c) the “wisdom of crowds” public surveying. Our results point to an estimated 14,770 excess deaths [95% CI 9820–22,790] in Pune from March 2020 to December 2021, of which 9093 were officially counted as COVID-19 deaths. We further calculated the undercount factor—the ratio of excess deaths to officially reported COVID-19 deaths. Our results point to an estimated undercount factor of 1.6 [95% CI 1.1–2.5]. Besides providing similar conclusions about excess deaths estimates across different methods, our study demonstrates the utility of frugal methods such as the analysis of death compensation claims and the wisdom of crowds in estimating excess mortality.


Methods
We adopted a multi-method approach to estimate COVID-19-related excess deaths in Pune from March 2020 to December 2021 by combining estimates from three methods: (a) statistical and epidemiological modeling with pre-pandemic mortality data, (b) analyzing media reports about discrepancies between official mortality data and death compensation claims, and (c) wisdom of crowds public surveying. Within statistical and epidemiological methods, we used three models: (a) a simple averaging technique 18,87, (b) the Farrington surveillance algorithm 20, and (c) an overdispersed Poisson model 88. Multi-method approaches help mitigate the flaws and biases inherent to any particular method. Piecing together data from different sources improves our understanding of the pandemic 10,16,92,93. Different methods often reflect different approaches to answering the same question, and thus may produce conflicting estimates. Rather than identifying any single "best" method, multi-method approaches combine diverse sources to produce a collective estimate that is typically more accurate than estimates from individual models. Combining estimates minimizes the pitfalls of relying on any particular individual model, and it can offset statistical bias, potentially canceling out overestimation and underestimation [94][95][96]. More broadly, multi-method approaches reflect an epistemic commitment to diverse viewpoints 97. They highlight how the voice of diverse stakeholders may be critical to establishing the ground truth 98. This is especially relevant in the context of COVID-19 where considerable debate exists about officially reported mortality figures 3,12,19,[99][100][101][102][103]119.
Next, we briefly describe the methods used in this study to estimate COVID-19-related excess deaths in Pune. All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by Carnegie Mellon University's Institutional Review Board (Registration No.: IRB00000352).

Statistical modeling with pre-pandemic all-cause mortality data
To estimate COVID-19-related excess mortality, researchers conventionally use various statistical modeling techniques ranging from simple averaging and linear regression to more sophisticated methods such as Monte-Carlo simulations, Poisson models, and other machine learning models 4,10,11,14,87,104. Other researchers have estimated excess deaths by extrapolating from more traditional epidemiological measures such as serosurveillance data, infection fatality rate, the overall population's susceptibility to the virus, the protection offered by vaccination, and the chances of reinfection 10,16,87,106. Most statistical and epidemiological models computed excess deaths by estimating the number of expected deaths based on pre-pandemic trends 5. However, different models varied widely in their assumptions and choice of relevant real-world parameters. Some researchers used simple averaging techniques to establish a baseline of expected all-cause mortality 18,87. Although useful, such simple approaches lack flexibility and robustness because they ignore real-world factors including seasonality, population growth, and contemporary trends of mortality. Epidemiologists addressed these limitations using more sophisticated methods such as widely adopted Poisson and quasi-Poisson models that include parameters such as population growth, seasonality, and recent temporal trends of mortality 4,7,9,20,24,105. Such models trace their roots to the "classical" Farrington surveillance algorithm that has been extensively used across diverse public health settings over the past three decades 20,[106][107][108]. This approach remains a reference point for many of the improved and extended Poisson-related models that have since been developed 109,110.
Our three statistical models used a dataset about monthly all-cause mortality in Pune from January 2014 through December 2021 (Fig. 2). This dataset was provided to the Jnana Prabodhini Foundation by the Pune Knowledge Cluster, a national-level Science and Innovation Cluster set up by the Office of the Principal Scientific Advisor, Government of India 91. A formal memorandum of understanding (MoU) of institutional collaboration was signed between the Jnana Prabodhini Foundation and the Pune Knowledge Cluster to ensure responsible data-sharing and to uphold privacy standards. The Pune Knowledge Cluster ultimately obtained this dataset from the Pune Municipal Corporation Health Office's death certificate registration data. Besides estimating excess deaths (Eq. 1), we also computed the undercount factor, the ratio of excess deaths to officially reported COVID-19 death figures (Eq. 2) 17. Next, we describe the three statistical models we used in this study.
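The two summary quantities referenced as Eq. 1 and Eq. 2 can be illustrated with a minimal Python sketch. It assumes only the definitions stated in the text (excess deaths as observed minus expected deaths, and the undercount factor as excess deaths divided by officially reported COVID-19 deaths). The function names are hypothetical, and the inputs are the overdispersed Poisson figures reported in the Results.

```python
def excess_deaths(observed: float, expected: float) -> float:
    """Eq. 1 as described in the text: deaths above the expected baseline."""
    return observed - expected


def undercount_factor(excess: float, reported_covid_deaths: float) -> float:
    """Eq. 2 as described in the text: excess deaths per officially reported COVID-19 death."""
    if reported_covid_deaths <= 0:
        raise ValueError("reported_covid_deaths must be positive")
    return excess / reported_covid_deaths


# Figures from the Results: all-cause deaths (March 2020 to December 2021),
# expected deaths under the overdispersed Poisson model, and official COVID-19 deaths.
excess = excess_deaths(observed=74_289, expected=59_110)
factor = undercount_factor(excess, reported_covid_deaths=9_093)
print(excess, round(factor, 1))
```

The unrounded difference is 15,179, which corresponds to the reported 15,180 once rounded to the nearest 10.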

Simple average model
We used a simple nonparametric model 18,87 to compute COVID-19-related excess deaths (Eq. 3). The expected deaths for each month from March 2020 to December 2021 were calculated as the mean number of total deaths recorded during that month for the previous six years. We also calculated the associated 95% prediction intervals [μ ± Zσ] where μ is the mean expected estimate and σ is the standard deviation around the predicted estimate. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. Negative values, where observed counts were below the expected thresholds, were set to zero. This method assumes that the number of deaths is effectively constant over time and that the underlying data are independent and identically distributed (i.i.d.). See Supporting Information for further methodological details and an evaluation of model assumptions.
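The simple average model can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's actual implementation: the function names are hypothetical, σ is taken to be the sample standard deviation of the baseline counts, the zero floor is applied to the interval bounds as well as the point estimate, and the input counts are made up.

```python
from statistics import mean, stdev

Z = 1.96  # 97.5th percentile of the standard normal distribution


def simple_average_baseline(baseline_counts):
    """Return (expected, lower, upper) deaths for one calendar month.

    baseline_counts: deaths recorded in that calendar month in each
    baseline year (six values in the setup described above).
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)  # sample standard deviation (assumption)
    return mu, mu - Z * sigma, mu + Z * sigma


def monthly_excess(observed, baseline_counts):
    """Excess deaths for one month, with negative values set to zero."""
    expected, lower, upper = simple_average_baseline(baseline_counts)
    point = max(0.0, observed - expected)
    # A higher expected count implies fewer excess deaths, so the bounds swap.
    low = max(0.0, observed - upper)
    high = max(0.0, observed - lower)
    return point, low, high


# Hypothetical counts for one calendar month across six baseline years.
march_baseline = [100, 110, 90, 105, 95, 100]
print(simple_average_baseline(march_baseline))
print(monthly_excess(observed=150, baseline_counts=march_baseline))
```

Summing the monthly point estimates and bounds over the study period would give period totals; how the study aggregated monthly intervals is described in its Supporting Information and is not assumed here.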

Farrington surveillance algorithm
We implemented the Farrington surveillance algorithm 20, a quasi-Poisson regression model that accounts for seasonality (Eq. 4), to compute the expected deaths for each month from March 2020 to December 2021. This model was implemented using the surveillance package in the R programming language 105,111. As is standard practice, the lower bound for the margin of error of the Farrington surveillance algorithm was computed using a one-sided 95% prediction interval. The upper bound was computed using average expected deaths. Negative values, where observed counts were below the expected thresholds, were set to zero 9. This method assumes that the number of deaths is effectively constant over time. See Supporting Information for further methodological details.
where α and β account for seasonal variation in deaths, and M is measured in months.

Overdispersed Poisson model
We implemented an overdispersed Poisson model that accounts for population growth in addition to seasonal variation in deaths (Eq. 5) 88 to compute the expected deaths for each month from March 2020 to December 2021. This model was implemented using the excessmort package in the R programming language 104. We obtained estimates about Pune's monthly population from the World Population Review 89. We report the associated 95% prediction intervals [μ ± Zσ] where μ is the mean expected estimate and σ is the standard deviation around the predicted estimate. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. Negative values, where observed counts were below the expected thresholds, were set to zero. See Supporting Information for further methodological details and an evaluation of model assumptions.
where P_M is the population in month M, α_M is a gradual trend accounting for the increasing life expectancy, and s_M is a seasonal trend accounting for seasonal variation in deaths.

Analyzing media reports about discrepancies between official mortality data and death compensation claims
Governmental bodies across the world including India's National Disaster Management Authority have implemented ex gratia monetary compensation policies targeted at households who lost family members to COVID-19 101,112. Such policies often employ liberal definitions of COVID-19 mortality, thus counting some of the COVID-19 deaths that may have been missed for various reasons 3,4,7,8, such as deaths that had occurred within a month of suffering from COVID-19 as well as the deaths of patients who did not possess positive RT-PCR (reverse transcription-polymerase chain reaction) tests, but nevertheless displayed other indicators of likely COVID-19 infection including positive antibody tests and HRCT (high-resolution computed tomography) chest scans 113. We analyzed reports from the Times of India 113, one of India's most-circulated daily newspapers, about the number of COVID-19 death compensation claims filed by households that lost family members to COVID-19. We treated this number as the estimated COVID-19-related excess deaths (Eq. 3). We then computed the undercount factor as the ratio between the number of registered COVID-19 death compensation claims and the number of officially reported COVID-19 deaths (Eq. 4). Unlike statistical modeling, our analysis of death compensation claims only provides a point estimate of excess deaths. However, to heuristically estimate the margin of error associated with our point estimate, we further computed undercount factors for other cities in Maharashtra.
Together, these cities constitute a fifth of Maharashtra state's population and almost half of Maharashtra's urban population. We calculated the standard error for the undercount factors, thus generating a range of plausible undercount factors for cities in Maharashtra [se = σ/√n where σ is the standard deviation across these cities and n is the number of cities]. This standard error was used to compute a 95% confidence interval for Pune [μ ± Z*se] where μ is the estimated undercount factor for Pune. We set Z = 1.96, the 97.5th percentile of a standard normal distribution. The lower and upper bounds of this confidence interval were multiplied by the number of reported COVID-19 deaths to compute plausible lower and upper estimates of excess COVID-19-related deaths in Pune. See Supporting Information for alternative heuristics of computing plausible lower and upper estimates of the undercount factor for Pune.

$$\text{excess deaths} = \text{reported COVID-19 death compensation claims} \tag{7}$$

$$\text{undercount factor} = \frac{\text{reported COVID-19 death compensation claims}}{\text{reported COVID-19 deaths}}$$

Wisdom of crowds public surveying
We conducted an online wisdom of crowds survey in Pune to obtain COVID-19-related excess death estimates. Ethics approval for this survey was obtained from Carnegie Mellon University's Institutional Review Board (Registration No.: IRB00000352). Only adults participated in the survey and completed a digital consent form before proceeding to the survey questionnaire. Thus, we confirm that informed consent was obtained from all participants. We did not collect identifying or potentially identifying information about survey respondents. We deployed the survey from 8 January 2022 to 8 February 2022. Participants responded to the survey hosted on the SurveyMonkey platform (now Momentive) in either Marathi or English. We employed a sample-of-convenience snowball-sampling method and promoted the survey via social media platforms such as WhatsApp and Facebook. A total of 280 adult residents of Pune participated in a COVID-19-related Knowledge, Attitudes, and Practices (KAP) survey (Table S2) 27. Survey respondents were asked COVID-19-related questions including: "As of January 1, 2022, there have been 9,117 COVID-19 deaths in Pune during the pandemic. This data is from official government figures released by Pune Municipal Corporation (PMC). What do you think is the true number of COVID-19 deaths in Pune (as of January 1, 2022)? Please choose a number between 0 and 90,000." The average cognitive estimate obtained from public surveying, that is, the collective guess about the "true number of COVID-19 deaths", was considered to be the number of excess COVID-19-related deaths (Eq. 5). We computed the undercount factor as the ratio between the collective cognitive estimate of the speculated true number of COVID-19 deaths and the number of officially reported COVID-19 deaths (Eq. 6). We calculated the standard error [se = σ/√n] and used it to compute the 95% confidence interval [μ ± Z*se] where we set Z = 1.96, the 97.5th percentile of a standard normal distribution.
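The crowd estimate and its confidence interval follow directly from the formulas above. The sketch below is a minimal Python illustration; the function name is hypothetical, σ is assumed to be the sample standard deviation, and the responses are made-up placeholders rather than the survey data.

```python
from math import sqrt
from statistics import mean, stdev

Z = 1.96  # 97.5th percentile of the standard normal distribution


def crowd_estimate(responses, reported_covid_deaths):
    """Mean crowd estimate, its 95% CI, and the implied undercount factor."""
    n = len(responses)
    mu = mean(responses)
    se = stdev(responses) / sqrt(n)  # se = sigma / sqrt(n)
    ci = (mu - Z * se, mu + Z * se)
    return {
        "excess_deaths": mu,
        "excess_deaths_ci": ci,
        "undercount_factor": mu / reported_covid_deaths,
        "undercount_factor_ci": tuple(b / reported_covid_deaths for b in ci),
    }


# Hypothetical responses to the "true number of COVID-19 deaths" question.
example_responses = [8_000, 9_000, 12_000, 15_000, 20_000, 30_000]
print(crowd_estimate(example_responses, reported_covid_deaths=9_093))
```

Because the standard error shrinks with √n, a sample of a few hundred respondents can yield a fairly narrow interval even when individual guesses vary widely.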

Aggregate estimate
We combined five estimates of COVID-19-related excess deaths and undercount factors obtained from different methods: (a) the simple averaging technique, (b) the Farrington surveillance algorithm, (c) the overdispersed Poisson model, (d) analyzing media-reported death compensation claims, and (e) the wisdom of crowds public surveying. We used a simple bootstrap to generate a plausible range of excess deaths and undercount factors for Pune. We first randomly sampled from the distributions generated by each of the five different methods. For all methods except the wisdom of crowds, we conducted sampling assuming a normal distribution. For the wisdom of crowds, we made no such assumption and sampled from the raw survey data. We conducted 10,000 iterations of such random sampling with replacement and used the resulting 10,000 means to compute a 95% confidence interval. See Supporting Information for further methodological details.
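One plausible reading of this bootstrap is sketched below in Python. The exact resampling scheme is given in the Supporting Information, so everything here is an assumption: the function names are hypothetical, each iteration draws one value per method and averages them, the interval is taken from the 2.5th and 97.5th percentiles of the resulting means, standard deviations are back-calculated from the reported intervals, and the survey responses are made-up placeholders. The sketch is not expected to reproduce the reported aggregate.

```python
import random

Z = 1.96


def sd_from_interval(lower, upper):
    """Approximate a normal SD from a symmetric two-sided 95% interval (assumption)."""
    return (upper - lower) / (2 * Z)


def bootstrap_aggregate(normal_methods, survey_responses, iterations=10_000, seed=0):
    """Return (mean, lower, upper) of the aggregate estimate.

    normal_methods: list of (mean, sd) pairs, one per method sampled as normal.
    survey_responses: raw crowd estimates, resampled with replacement.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(iterations):
        draws = [rng.gauss(mu, sd) for mu, sd in normal_methods]
        draws.append(rng.choice(survey_responses))  # one raw survey response
        means.append(sum(draws) / len(draws))
    means.sort()
    lower = means[int(0.025 * iterations)]
    upper = means[int(0.975 * iterations) - 1]
    return sum(means) / len(means), lower, upper


# Point estimates and intervals as reported in the Results; SD conversions are assumptions.
normal_methods = [
    (20_490, sd_from_interval(10_050, 33_050)),  # simple average
    (9_200, (19_900 - 9_200) / 1.645),           # Farrington, one-sided interval (rough assumption)
    (15_180, sd_from_interval(5_990, 29_090)),   # overdispersed Poisson
    (13_000, sd_from_interval(6_910, 19_100)),   # death compensation claims
]
# Hypothetical stand-in for the raw survey responses, which are not reproduced here.
survey_responses = [5_000, 9_000, 12_000, 18_000, 25_000, 40_000]
print(bootstrap_aggregate(normal_methods, survey_responses))
```

Dividing the resulting bounds by the officially reported COVID-19 deaths would give the corresponding range for the undercount factor.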

Results
We used a multi-method approach to compute COVID-19-related excess death estimates in Pune from March 2020 to December 2021 compared to the 74,289 total reported deaths during this time. We also computed the undercount factor in this period, that is, the ratio of estimated excess deaths to the 9,093 officially reported COVID-19 deaths. Table 1 and Fig. 3 present a summary of excess death estimates and undercount factors estimated from all different methods in this study. All estimated expected deaths and excess deaths have been rounded to the nearest 10 to avoid a false sense of precision. First, we used three types of statistical models. Based on the pre-pandemic trends, the simple average model estimated 53,790 expected deaths (95% PI: 41,230-64,230). Therefore, the estimated COVID-19-related excess deaths were 20,490 (95% PI: 10,050-33,050) (Fig. 4A). Compared to the estimated excess deaths, the 9,093 officially reported COVID-19 deaths were an undercount by a factor of 2.3 (95% PI: 1.1-3.6). However, the simple averaging model did not incorporate seasonal variation in deaths. Accounting for seasonal variation, the Farrington surveillance algorithm estimated 65,090 expected deaths (one-sided 95% PI: 54,390-65,090). Therefore, this method revealed 9,200 estimated excess deaths (one-sided 95% PI: 9,200-19,900) with an undercount factor of 1.01 (one-sided 95% PI: 1.01-2.2) (Fig. 4B). In addition to seasonal variation, the overdispersed Poisson model accounted for population growth and estimated 59,110 expected deaths (95% PI: 45,200-68,300), implying 15,180 estimated excess deaths (95% PI: 5,990-29,090) with an undercount factor of 1.7 (95% PI: 0.7-3.2) (Fig. 4C).
Second, we analyzed media reports about discrepancies between official mortality data and the number of COVID-19 death compensation claims filed by the public. As of January 2022, residents of Pune had filed around 13,000 death compensation claims 113, which served as the estimated COVID-19-related excess deaths in Pune based on media reports. Compared to the officially reported mortality, this figure implied an undercount factor of 1.4. Using the same media reports 113, we additionally computed excess deaths and undercount factors for other major cities in Maharashtra. Table 2 presents a summary of death compensation claims filed in different major cities in Maharashtra and the resultant undercount factors of COVID-19-related excess deaths. Finally, we used the undercount factors from cities in Maharashtra to compute a 95% confidence interval for Pune. Our analysis of media reports about discrepancies between official mortality data and the number of COVID-19 death compensation claims filed by the public points to an estimated 13,000 excess deaths [95% CI: 6,910-19,100] in Pune from March 2020 to January 2022 (Table 1), implying an undercount factor of 1.4 [95% CI: 0.8-2.1].
Third, we conducted a wisdom of crowds survey to obtain cognitive estimates about pandemic-associated excess mortality. Cognitive estimates for excess deaths were diverse, with a sixth of survey respondents believing the official COVID-19 numbers were in fact an overestimate (Fig. 5). However, the crowd estimated that the true number of COVID-19 deaths in Pune was 18,900 [95% CI: 16,930-20,880], which served as the estimated COVID-19-related excess deaths. In other words, the crowd estimated an undercount factor of 2.1 [95% CI: 1.9-2.3].
Finally, we used a simple bootstrap to combine estimates from different methods and computed an aggregate estimate of COVID-19-related excess deaths in Pune (Fig. S11). In aggregate, our results estimate 14,770 excess deaths [95% CI: 9,820-22,790] in Pune from March 2020 to December 2021, translating to an undercount factor of 1.6 [95% CI: 1.1-2.5].

Discussion
In our case study, we computed COVID-19-related excess death estimates for Pune. To our knowledge, this is the first such effort; therefore, our results provide new information that can inform the public health policy of Pune. Using multiple methods, we estimated 14,770 excess deaths [95% CI: 9,820-22,790] in Pune from March 2020 to December 2021, of which 9,093 were officially counted as COVID-19 deaths. We further calculated the undercount factor, a metric that allowed for easy comparison of the differential impact of the pandemic across diverse geographical regions and socioeconomic groups 2,13,21,113. We estimated an undercount factor of 1.6 [95% CI: 1.1-2.5]. An undercount factor of 1 represents an ideal scenario in which public health infrastructures are robust and resilient enough to maintain complete and high-quality data, even during acute crisis events such as pandemics. However, this ideal scenario was rarely achieved globally and across major Indian cities, where the estimated undercount factors were around three (Table S1) 10,14,15,22,156. Even some of the world's best healthcare systems saw undercount factors around 1.5 (Fig. S2 in Supporting Information) 8,113. Based on our results, Pune's performance in this regard seems comparable to some of the leading healthcare systems across the world, with its public health data recording infrastructure proving to be fairly robust and resilient during the COVID-19 pandemic 115,116.
In addition to providing novel public health information about Pune, our main goal was to investigate whether diverse methods of estimating pandemic-related excess deaths provided us with accurate and overlapping statistical estimates. We computed COVID-19-related excess deaths and undercount factors from five different methods: (a) the simple averaging technique, (b) the Farrington surveillance algorithm, (c) the overdispersed Poisson model, (d) analyzing media-reported discrepancies between official mortality data and death compensation claims, and (e) the wisdom of crowds public surveying. Despite their limitations, diverse methods, both conventional and frugal, produced excess deaths estimates and undercount factors that were within the margins of error of each other. Results from all models except the Farrington surveillance algorithm point towards a similar conclusion about the COVID-19-related undercount factor for Pune. These findings can inform Pune's public health policy: for future pandemics or health crises, decision-makers could assume a worst-case scenario and prepare for up to 2.5 times (upper limit of the 95% confidence interval associated with our aggregate estimate) the reported number of pandemic-caused deaths. Our results reinforce the strength of using multi-method approaches to triangulate the true extent of the impact of the COVID-19 pandemic. By combining conventional and novel frugal methods of estimating pandemic-associated excess mortality in a multi-method approach, we minimized the pitfalls of relying on any particular individual method 86,[95][96][97][98]117,118.
Our findings can have important implications, especially in resource-constrained settings, where robust and resilient data infrastructures tend to be lacking or limited, and in contexts where considerable debate exists about the underlying ground truth [1][2][3]12,19,[26][27][28][99][100][101][102][103]119. Particularly with the COVID-19 pandemic, there are widespread concerns about the accuracy of officially reported COVID-19-related deaths [3][4][5][6][7][8]. Our study adds to a growing body of COVID-19-related excess mortality literature that emerged in response to such skepticism about the accuracy of officially reported pandemic casualties [3][4][5]7,[9][10][11][12][13][14][15]22. Future research efforts could focus on other untapped frugal alternatives such as analyzing discrepancies between COVID-19 cremation counts and officially reported COVID-19 mortality data 157,158. Our preliminary results from this method for Pune suggest consilience with the other methods we employed in our study (Table S3). However, these preliminary results are based on a temporally restricted dataset about COVID-19 cremation counts, and a more complete dataset is needed to ascertain the robustness of this method.

Table 2. Discrepancies between filed death compensation claims and officially reported deaths and estimates of undercount factors during the COVID-19 pandemic in major cities in Maharashtra as of January 2022. a Unlike other cities, the number of death compensation claims filed for Pune based on the media report was approximate 114.

Within our multi-method approach, we employed three conventional statistical and epidemiological models that have been previously widely used to compute COVID-19-related excess mortality. These methods are often considered the gold standard of excess mortality estimation because of their interpretability and inclusion of multiple epidemiologically relevant real-world factors including seasonality, population growth, and contemporary trends of mortality. Therefore, our results from these methods represent important benchmarks to examine the effectiveness of the novel frugal methods we used. However, these conventional statistical and epidemiological models rely on high-quality all-cause pre-pandemic data that is only accessible in robust and transparent public health data recording systems. The performance of these models suffers in the absence of such data. One limitation of our study was the low granularity of our dataset; it included only monthly (not weekly or daily) data. Future research efforts can address this limitation by using high-granularity datasets. Additionally, although Pune is estimated to have high pre-pandemic death registration coverage 18,120, our study did not account for fluctuations in death registration coverage during the COVID-19 pandemic. Future work should use indirect proxy estimates of fluctuations in death registration coverage that can be computed from relevant public health and demographic data such as birth registration coverage, the incidence of traffic accidents, and surveillance of other infectious diseases such as AIDS and tuberculosis (Table S4) 18,[121][122][123][124].
Two of the statistical models we used, (a) the simple averaging technique and (b) the Farrington surveillance algorithm, did not incorporate underlying population data, and therefore can be readily deployed when these data are non-existent or difficult to obtain due to monetary, bureaucratic, and time constraints. An additional strength of the simple averaging technique is its ease of implementation. This method does not require computer programming knowledge, thus increasing its potential for widespread applicability in low-resource and data-scarce settings. Both the simple averaging technique and the Farrington surveillance algorithm assumed that the pre-pandemic number of deaths was effectively constant over time. We assessed this assumption for both models (see Supporting Information). Even though there was a slight yet significant increase in mortality over time (Fig. S4), both models showed relatively robust performance despite this violated assumption (Fig. S5, Fig. S6, Fig. S7, and Fig. S8). Robust model performance depended upon the amount of underlying data used; both models required monthly data across at least four years. The overdispersed Poisson model incorporated underlying population data to account for fluctuations in mortality rates over time and thus did not assume that the pre-pandemic number of deaths was effectively constant over time. It also accounted for sustained indirect effects that both the simple averaging technique and the Farrington surveillance algorithm lacked the power to detect 88, thereby offering more flexibility and robustness compared to these two models. Finally, the overdispersed Poisson nature of this model allowed it to capture more variance than predicted by a Poisson model. This makes it well-suited to our dataset of monthly reported all-cause mortality (mean = 2,687; variance = 418,337).
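The overdispersion noted above can be made concrete with a quick check. A Poisson distribution has variance equal to its mean, so a variance-to-mean ratio far above 1 signals overdispersion. The Python sketch below uses the mean and variance reported in the text; the function name is hypothetical, and the raw ratio is only a rough indicator because it ignores the seasonal and population trends that the fitted model absorbs.

```python
def dispersion_ratio(mean_count: float, variance: float) -> float:
    """Variance-to-mean ratio; equals 1 for a Poisson distribution."""
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    return variance / mean_count


# Mean and variance of monthly all-cause deaths as reported in the text.
ratio = dispersion_ratio(mean_count=2_687, variance=418_337)
print(round(ratio, 1))
```

A ratio of roughly 156 indicates that a plain Poisson model would substantially understate the variability in the monthly counts.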
In addition to using statistical and epidemiological models, we also analyzed media reports about discrepancies between official mortality data and death compensation claims. To our knowledge, our study is the first effort to use this frugal method to estimate pandemic-associated excess deaths. The analyses in this method were possible only because of the availability of data about death compensation claims filed by the public under India's ex gratia monetary compensation policy that employed a liberal interpretation of pandemic-associated mortality 101,113,159,160. However, this policy may have led to somewhat inaccurate estimates of excess mortality due to the submission of fraudulent documents or the double counting of deaths in neighboring jurisdictions 125,159,160. Nonetheless, this frugal method remains an important component of multi-method approaches to estimating excess COVID-19-related deaths, given the checks and balances implemented by the government to ensure accurate relief disbursement 113. Future research should use disaggregated and officially verified ex gratia death compensation data to compute more precise estimates of pandemic-associated excess mortality.
Finally, we examined the effectiveness of another frugal method, the wisdom of crowds approach, to estimate COVID-19-related excess mortality. Although this approach has been widely used across multiple real-world domains before, including during the COVID-19 pandemic 126,127, to our knowledge, this frugal method has not yet been used to estimate COVID-19-related excess mortality. Therefore, our study provides a novel confirmation of the potential of the wisdom of crowds approach as a complementary tool of frugal fact-finding. However, the results from our wisdom of crowds public survey should be interpreted with caution, because collective cognitive estimates may be biased, sometimes resulting in herding, mob mentality, informational echo chambers, and widespread proliferation of unscientific opinions [128][129][130][131][132][133][134][135][136][137]. Nonetheless, these limitations can be overcome by integrating findings from judgment, decision-making, behavioral economics, and cognitive science that highlight how domain-general psychophysical representations and Bayesian mechanisms may account for many of the systematic mistakes observed in cognitive estimation across many real-world contexts [137][138][139][140][141][142][143][144][145][146][147][148][149]151. These findings suggest that domain-general processes account for many of the quirks of human estimation, judgment, and decision-making. Accounting for such general psychophysical factors and other cognitive biases can greatly improve the accuracy, robustness, and effectiveness of the wisdom of crowds approach 150. For example, in our study, we were able to partially mitigate the biases introduced due to social and peer influence 127,128,130,151 by conducting an online, anonymous public survey. In addition to being a non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) population 152, our survey sample of adult residents from Pune was diverse in terms of gender, age, native language, occupation, socioeconomic status, and COVID-19 infection history (Table S2). These study participants also displayed heterogeneous COVID-19-related beliefs and behaviors. Thus, the diversity, decentralization, and independence of opinions 126 in our sample may have mitigated some of the inaccuracies stemming from demand characteristics and response biases. In our future work, we plan to explore how diverse COVID-19-related psychological perceptions influence cognitive estimates about COVID-19-related deaths, thus adding to a rapidly growing literature about cognitive estimation and the wisdom of crowds.

Our findings confirm that, like most other places, officially reported COVID-19 mortality in Pune was an underestimate. These findings highlight the limitations of public health infrastructures in capturing plentiful, high-quality, and timely data during unpredictable black swan events such as the COVID-19 pandemic 153. To address these limitations, strong health data systems are needed to inform healthcare utilization planning, resource allocation, and policymaking to ensure healthy living and promote well-being for all (UN Sustainable Development Goal 3) 154. Robust data systems also permit post-mortem evaluations of pandemic mitigation measures including vaccinations and public lockdowns 156. To prepare for future pandemics, resilient public health systems require sustained material investments in vital infrastructure and medical equipment, as well as the availability of credible, open-source, and high-quality data (UN Sustainable Development Goal 17.19) 154. The success of these initiatives will depend on both long-term material investments in vital infrastructure and medical equipment, as well as the availability and abundance of credible, open-source, high-quality data. Therefore, governments, think tanks, research universities, non-profits, industry actors, the media, and other relevant stakeholders have an onus to build and maintain robust data collection and storage infrastructures. This will support wider aims of sensitive societal governance, public accountability, and memorialization of one of the largest public health crises the world has collectively faced in over a century 1,2.

Figure 3. Undercount factor computed from COVID-19-related excess deaths in Pune. The margin of error is the 95% PI for the statistical models: simple average, Farrington surveillance algorithm (one-sided), and overdispersed Poisson model. It is the 95% CI for the analysis of death compensation claims from media reports, the wisdom of crowds public surveying, and the aggregate estimate. An undercount factor of 1 represents an ideal scenario where all estimated excess deaths can be attributed to officially reported COVID-19 mortality.

Figure 4. Results from three statistical models: A) the simple average model, B) the Farrington surveillance algorithm, and C) the overdispersed Poisson model. The dotted lines show the expected deaths (estimated from the statistical models) in Pune, the green lines show the officially reported all-cause deaths in Pune, and the gray bands show the 95% PI (one-sided for the Farrington surveillance algorithm).

Figure 2. Officially reported monthly all-cause deaths in Pune from January 2014 through December 2021.

Table 1. Estimated expected deaths, excess deaths, and undercount factors during the COVID-19 pandemic in Pune. The values in the parentheses are the lower and upper bounds of the margin of error associated with each estimate. The margin of error is the 95% PI for the statistical models: simple average, Farrington surveillance algorithm (one-sided), and overdispersed Poisson model. It is the 95% CI for the analysis of death compensation claims from media reports, the wisdom of crowds public surveying, and the aggregate estimate. a Unlike other cities, the number of death compensation claims filed for Pune based on the media report was approximate 113.